Combining Rule-Based and Case-Based Learning for Iterative Part-of-Speech Tagging

Authors

  • Alneu de Andrade Lopes
  • Alípio Mário Jorge
Abstract

In this article we show how the accuracy of a rule-based first-order theory can be increased by combining it with a case-based approach in a classification task. Case-based learning is used once the rule language bias is exhausted. This is achieved iteratively: in each iteration, theories consisting of first-order rules are induced and the examples they cover are removed. The process stops when it is no longer possible to find rules of satisfactory quality, and the remaining examples are then handled as cases. The case-based approach proposed here is also, to a large extent, new. Instead of only storing the cases as provided, it has a learning phase in which, for each case, it constructs and stores a set of explanations with support and confidence above given thresholds. These explanations have different levels of generality, and the maximally specific one corresponds to the case itself. The same case may have several explanations, each representing a different perspective on it. To classify a new case, the method looks for relevant stored explanations that apply to it. The different views of a case given by its explanations correspond to considering different sets of conditions or features when analyzing the case. In other words, they lead to different ways of computing the similarity between known cases or explanations and the new case to be classified, as opposed to the commonly used single global metric. Experimental results obtained on a corpus of Portuguese texts for the task of part-of-speech tagging show a significant improvement.
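The two-stage scheme described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: propositional attribute-value conditions stand in for first-order rules, a greedy threshold search stands in for the actual induction system, explanations are scored over the full training set, and competing explanations are ranked by confidence and then by specificity. All function names and the toy Portuguese-like data are hypothetical.

```python
from itertools import combinations

def matches(conds, feats):
    """True when every (feature, value) condition holds in feats."""
    return all(feats.get(f) == v for f, v in conds)

def score(conds, label, data):
    """Return (support, confidence) of the rule 'conds -> label' over data."""
    covered = [lab for feats, lab in data if matches(conds, feats)]
    support = sum(1 for lab in covered if lab == label)
    return support, (support / len(covered) if covered else 0.0)

def candidates(feats, max_len=None):
    """Condition subsets of one example, from general to specific."""
    items = sorted(feats.items())
    for k in range(1, (max_len or len(items)) + 1):
        yield from combinations(items, k)

def induce_rules(data, min_sup, min_conf, max_len):
    """Iterative covering: add the best rule, remove what it covers, repeat."""
    rules, remaining = [], list(data)
    while True:
        best = None
        for feats, label in remaining:
            for conds in candidates(feats, max_len):
                sup, conf = score(conds, label, remaining)
                if sup >= min_sup and conf >= min_conf:
                    if best is None or (conf, sup) > best[:2]:
                        best = (conf, sup, conds, label)
        if best is None:  # language bias exhausted (assumed stopping test)
            return rules, remaining
        rules.append((best[2], best[3]))
        remaining = [(f, l) for f, l in remaining if not matches(best[2], f)]

def build_explanations(cases, data, min_sup, min_conf):
    """Store every generalisation of each case that passes the thresholds."""
    store = []
    for feats, label in cases:
        for conds in candidates(feats):  # the last subset is the case itself
            sup, conf = score(conds, label, data)
            if sup >= min_sup and conf >= min_conf:
                store.append((conds, label, conf))
    return store

def classify(feats, rules, store, default=None):
    """Try the rules first, then fall back to applicable explanations."""
    for conds, label in rules:
        if matches(conds, feats):
            return label
    applicable = [(conf, len(conds), label)
                  for conds, label, conf in store if matches(conds, feats)]
    return max(applicable)[2] if applicable else default

# Toy training set: current word and previous tag (hypothetical data).
data = [
    ({"word": "o", "prev": "START"}, "ART"),
    ({"word": "o", "prev": "VERB"}, "ART"),
    ({"word": "o", "prev": "PREP"}, "ART"),
    ({"word": "a", "prev": "START"}, "ART"),
    ({"word": "a", "prev": "NOUN"}, "PREP"),
    ({"word": "canto", "prev": "ART"}, "NOUN"),
    ({"word": "canto", "prev": "PRON"}, "VERB"),
]

rules, cases = induce_rules(data, min_sup=2, min_conf=1.0, max_len=1)
store = build_explanations(cases, data, min_sup=1, min_conf=1.0)

print(classify({"word": "o", "prev": "NOUN"}, rules, store))      # ART
print(classify({"word": "canto", "prev": "PRON"}, rules, store))  # VERB
print(classify({"word": "livro", "prev": "ART"}, rules, store))   # NOUN
```

On this toy data a single rule (`word = o` implies `ART`) passes the thresholds, and the four remaining examples become cases. The unseen word "livro" is still tagged because the general explanation `prev = ART` implies `NOUN`, derived from a stored case, applies to it. This is the sense in which each explanation defines its own local notion of similarity.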

Similar resources

Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging

In this paper we describe an unsupervised learning algorithm for automatically training a rule-based part of speech tagger without using a manually tagged corpus. We compare this algorithm to the Baum-Welch algorithm, used for unsupervised training of stochastic taggers. Next, we show a method for combining unsupervised and supervised rule-based training algorithms to create a highly accurate t...

Part-of-Speech (POS) Tagging Revisited

Accurate part-of-speech (POS) tagging of natural language text data can add power to automated information retrieval and extraction. Brill's transformation-based learning (TBL) approach to automated POS tagging was introduced in 1992, combining virtues of rule-based and stochastic methods. Brill's innovative idea was to use machine learning techniques to search through all of rule space for the...

Part of Speech (POS) Tagger for Kokborok

Part-of-speech (POS) tagging is the process of assigning the appropriate lexical category to each word in a sentence of a natural language. This paper describes the development of a POS tagger using rule-based and supervised methods for Kokborok, a resource-constrained and less computerized Indian language. For rule-based POS tagging, we took the help of a morphological analy...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging along with dependency parsing in a pipeline mode. Unfortunately, in pipel...

Part-of-Speech Tagging of Persian Using a Fuzzy Network Model

Part-of-speech tagging (POS tagging) is an ongoing research topic in natural language processing (NLP). The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...

Publication date: 2000